Re-TACRED: Addressing Shortcomings of the TACRED Dataset


Abstract

TACRED is one of the largest and most widely used sentence-level relation extraction datasets. Proposed models that are evaluated using this dataset consistently set new state-of-the-art performance. However, they still exhibit large error rates despite leveraging external knowledge and unsupervised pretraining on large text corpora. A recent study suggested that this may be due to poor dataset quality. The study observed that over 50% of the most challenging sentences from the development and test sets are incorrectly labeled and account for an average drop of 8% f1-score in model performance. However, this study was limited to a small biased sample of 5k (out of a total of 106k) sentences, substantially restricting the generalizability and broader implications of its findings. In this paper, we address these shortcomings by: (i) performing a comprehensive study over the whole TACRED dataset, (ii) proposing an improved crowdsourcing strategy and deploying it to re-annotate the whole dataset, and (iii) performing a thorough analysis to understand how correcting the TACRED annotations affects previously published results. After verification, we observed that 23.9% of TACRED labels are incorrect. Moreover, evaluating several models on our revised dataset yields an average f1-score improvement of 14.3% and helps uncover significant relationships between the different models (rather than simply offsetting or scaling their scores by a constant factor). Finally, aside from our analysis, we also release Re-TACRED, a new completely re-annotated version of the TACRED dataset that can be used to perform reliable evaluation of relation extraction models.
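The abstract's two headline numbers can be made concrete with a small, hypothetical sketch. It shows the share of labels that differ between two annotation versions, and how the same model predictions can score differently once the gold labels are corrected. It assumes the common TACRED convention of micro-averaged F1 over positive relation labels only (the `no_relation` label never counts as a hit). The function names and the toy labels are illustrative assumptions, not the paper's code or the official TACRED scorer.

```python
from typing import Sequence

NO_RELATION = "no_relation"  # TACRED's negative ("no relation holds") label


def micro_f1(gold: Sequence[str], pred: Sequence[str], negative: str = NO_RELATION) -> float:
    """Micro-averaged F1 over positive relations (assumed TACRED-style convention)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    # A hit is a positive prediction that exactly matches the gold relation.
    correct = sum(1 for g, p in zip(gold, pred) if p != negative and g == p)
    predicted = sum(1 for p in pred if p != negative)  # positive predictions
    actual = sum(1 for g in gold if g != negative)  # positive gold labels
    precision = correct / predicted if predicted else 0.0
    recall = correct / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def label_change_rate(original: Sequence[str], revised: Sequence[str]) -> float:
    """Fraction of examples whose label differs between two annotation versions."""
    if len(original) != len(revised):
        raise ValueError("annotation versions must cover the same examples")
    if not original:
        return 0.0
    changed = sum(1 for a, b in zip(original, revised) if a != b)
    return changed / len(original)


if __name__ == "__main__":
    # Hypothetical toy labels; not taken from TACRED or Re-TACRED.
    original = ["per:title", "no_relation", "org:founded_by", "per:title"]
    revised = ["per:title", "per:employee_of", "no_relation", "per:title"]
    pred = ["per:title", "per:employee_of", "no_relation", "per:title"]

    print(label_change_rate(original, revised))  # 0.5
    print(round(micro_f1(original, pred), 3))  # 0.667
    print(micro_f1(revised, pred))  # 1.0
```

In this toy case the model's predictions are unchanged; only the gold labels differ. Correcting two of the four labels raises the score from 0.667 to 1.0, which is the kind of effect the abstract attributes to label noise rather than to model quality.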


Similar resources

A TACRED Data Collection and Validation

TACRED leverages the work done selecting query entities and annotating system responses in the TAC KBP evaluations. In each year of the TAC KBP evaluation (2009–2015), 100 query entities are given to participating KBP systems with the aim of filling in valid knowledge base entries for these entities. Our annotation effort re-uses these query entities, annotating each sentence in the source corp...


Addressing the Shortcomings of One-Way Chains

One-way hash chains have been the preferred choice, over the symmetric and asymmetric key cryptography, in security setups where efficiency mattered; despite the ephemeral confidentiality and authentication they assure. Known constructions of one-way chains (for example, SHA-1 based), only ensure the forward secrecy and have limitations over their length i.e., a priori knowledge of chain’s leng...


Cross Dataset Person Re-identification

Until now, most existing research on person re-identification has aimed at improving the recognition rate in a single-dataset setting. The training data and testing data of these methods are from the same source. Although they have obtained high recognition rates in experiments, they usually perform poorly in practical applications. In this paper, we focus on the cross dataset person re-identification...


Addressing the shortcomings of three recent Bayesian methods for detecting interspecific recombination in DNA sequence alignments.

We address a potential shortcoming of three probabilistic models for detecting interspecific recombination in DNA sequence alignments: the multiple change-point model (MCP) of Suchard et al. (2003), the dual multiple change-point model (DMCP) of Minin et al. (2005), and the phylogenetic factorial hidden Markov model (PFHMM) of Husmeier (2005). These models are based on the Bayesian paradigm, wh...


Position-aware Attention and Supervised Data Improve Slot Filling

Organized relational knowledge in the form of “knowledge graphs” is important for many applications. However, the ability to populate knowledge bases with facts automatically extracted from documents has improved frustratingly slowly. This paper simultaneously addresses two issues that have held back prior work. We first propose an effective new model, which combines an LSTM sequence model with...



Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2021

ISSN: ['2159-5399', '2374-3468']

DOI: https://doi.org/10.1609/aaai.v35i15.17631